Computational Auditory Scene Analysis and Automatic Speech Recognition

Authors

  • Arun Narayanan
  • DeLiang Wang
Abstract

The human auditory system is, in a way, an engineering marvel. It is able to do wonderful things that powerful modern machines find extremely difficult. For instance, our auditory system is able to follow the lyrics of a song when the input is a mixture of speech and musical accompaniment. Another example is a party situation. Usually there are multiple groups of people talking, with laughter, ambient music and other sound sources running in the background. The input our auditory system receives through the ears is a mixture of all these. In spite of such a complex input, we are able to selectively listen to an individual speaker, attend to the music in the background, and so on. In fact, this ability of ‘segregation’ is so instinctive that we take it for granted without wondering about the complexity of the problem our auditory system solves. Colin Cherry, in the 1950s, coined the term ‘cocktail party problem’ while trying to describe how our auditory system functions in such an environment [12]. He did a series of experiments to study the factors that help humans perform this complex task [11]. A number of theories have been proposed since then to explain the observations made in those experiments [11,12,70]. Helmholtz had, in the mid-nineteenth century, reflected upon the complexity of this signal by using the example of a ballroom setting [22]. He remarked that even though the signal is “complicated beyond conception,” our ears are able to “distinguish all the separate constituent parts of this confused whole.” So how does our auditory system solve the so-called cocktail party problem? Bregman tried to give a systematic account in his seminal 1990 book Auditory Scene Analysis [8]. He calls

Similar resources

A computational auditory scene analysis system for speech segregation and robust speech recognition

A conventional automatic speech recognizer does not perform well in the presence of multiple sound sources, while human listeners are able to segregate and recognize a signal of interest through auditory scene analysis. We present a computational auditory scene analysis system for separating and recognizing target speech in the presence of competing speech or noise. We estimate, in two stages, ...

Signal Separation Motivated by Human Auditory Perception: Applications to Automatic Speech Recognition

The human auditory system uses a number of well-identified cues to segregate and separate individual sound sources in a complex acoustical environment. For example, researchers in auditory scene analysis have long identified cues such as common onset, correlated fluctuations in instantaneous amplitude and frequency, harmonicity, and common interaural time and amplitude differences as ways of id...

Challenge Problem for Computational Auditory Scene Analysis: Understanding Three Simultaneous Speeches

Understanding three simultaneous speeches is proposed as a challenge problem to foster artificial intelligence, speech and sound understanding or recognition, and computational auditory scene analysis research. Automatic speech recognition under noisy environments is attacked by speech enhancement techniques such as noise reduction and speaker adaptation. However, the signal-to-noise ratio of sp...

On Ideal Binary Mask As the Computational Goal of Auditory Scene Analysis

What is the computational goal of auditory scene analysis? This is a key issue to address in the Marrian information-processing framework. It is also an important question for researchers in computational auditory scene analysis (CASA) because it bears directly on how a CASA system should be evaluated. In this chapter I discuss different objectives used in CASA. I suggest as a main CASA goal th...

Auditory Scene Analysis: Computational Models

Human listeners have a remarkable ability to separate a complex mixture of sounds into discrete sources. The processes underlying this ability have been termed ‘auditory scene analysis’ (Bregman 1990; this volume). Recently, an interdisciplinary field known as ‘computational auditory scene analysis’ (CASA) has emerged which aims to develop computer systems that mimic this aspect of hearing (Ros...

16 Separation of Speech by Computational Auditory Scene Analysis

The term auditory scene analysis (ASA) refers to the ability of human listeners to form perceptual representations of the constituent sources in an acoustic mixture, as in the well-known ‘cocktail party’ effect. Accordingly, computational auditory scene analysis (CASA) is the field of study which attempts to replicate ASA in machines. Some CASA systems are closely modelled on the known stages o...

Publication date: 2012